Air Quality Predictions in Beijing

Import Files & Modules

Display tables

Find the nearest stations using KNN
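
As a minimal sketch of this step, the block below ranks weather stations by great-circle distance to one air quality station and keeps the closest k. It uses a brute-force search with the haversine formula, which may differ from the KNN implementation the notebook actually uses. The names `haversine_km` and `k_nearest_stations`, the station names, and all coordinates are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def k_nearest_stations(target, candidates, k):
    """Return the names of the k candidates closest to target.

    target: (lat, lon); candidates: dict mapping name -> (lat, lon).
    """
    ranked = sorted(candidates, key=lambda name: haversine_km(*target, *candidates[name]))
    return ranked[:k]

# Hypothetical coordinates, for illustration only.
air_quality_station = (39.90, 116.40)
weather_stations = {
    "grid_a": (39.90, 116.50),
    "grid_b": (40.00, 116.40),
    "grid_c": (39.50, 116.00),
    "obs_d": (39.91, 116.41),
}
nearest = k_nearest_stations(air_quality_station, weather_stations, k=2)
print(nearest)
```

A brute-force ranking is enough for a few hundred stations; a tree-based index only pays off for much larger sets.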

Data preprocessing

Preprocess the Air Quality Station Data

Preprocess Grid Weather Data

Preprocess Observed Weather Data

Concat the weather features from Grid Weather Station & Observed Weather Station

Outliers

Temperature
Pressure
Humidity
Wind Direction
Wind Speed
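
One way to handle outliers in the five features listed above is to mask values outside a plausible range as missing, so that a later imputation step can fill them. The sketch below assumes this approach; the bounds, the units, the column names, and the helper `mask_outliers` are all hypothetical and should be adjusted to the real data.

```python
# Hypothetical plausible ranges; tune them to the actual data and units.
PLAUSIBLE_RANGES = {
    "temperature": (-40.0, 50.0),     # degrees Celsius (assumed unit)
    "pressure": (800.0, 1100.0),      # hPa (assumed unit)
    "humidity": (0.0, 100.0),         # percent (assumed unit)
    "wind_direction": (0.0, 360.0),   # degrees (assumed unit)
    "wind_speed": (0.0, 60.0),        # m/s (assumed unit)
}

def mask_outliers(rows, ranges=PLAUSIBLE_RANGES):
    """Return copies of the rows with out-of-range values replaced by None."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for feature, (low, high) in ranges.items():
            value = new_row.get(feature)
            if value is not None and not (low <= value <= high):
                new_row[feature] = None
        cleaned.append(new_row)
    return cleaned

# Made-up sample rows, for illustration only.
sample = [
    {"temperature": 12.5, "pressure": 1012.0, "humidity": 45.0,
     "wind_direction": 180.0, "wind_speed": 3.2},
    {"temperature": 999999.0, "pressure": 1010.0, "humidity": 140.0,
     "wind_direction": 90.0, "wind_speed": 2.0},
]
cleaned = mask_outliers(sample)
```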

Concat features from both Grid & Observed Weather Station

Concat the weather features for each Air Quality Station

  1. From the information above, the data sources share the time period from 2017-01-30 16:00:00 to 2018-05-02 23:00:00, so we train only on features from this period.
  2. The weather features for each air quality station are the features of its K nearest weather stations.
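
The shared period in point 1 is the intersection of the time ranges covered by each data source: the latest start and the earliest end. A minimal sketch follows; the per-source ranges are made up and chosen only so that the overlap matches the period quoted above, and `shared_period` is a hypothetical helper.

```python
from datetime import datetime

def shared_period(ranges):
    """Return the (start, end) covered by every range, or None if there is no overlap."""
    start = max(r[0] for r in ranges.values())  # latest start
    end = min(r[1] for r in ranges.values())    # earliest end
    return (start, end) if start <= end else None

FMT = "%Y-%m-%d %H:%M:%S"

def ts(text):
    """Parse a timestamp string in the format used by the notebook."""
    return datetime.strptime(text, FMT)

# Hypothetical coverage per data source, for illustration only.
coverage = {
    "air_quality": (ts("2017-01-01 14:00:00"), ts("2018-05-02 23:00:00")),
    "grid_weather": (ts("2017-01-01 00:00:00"), ts("2018-05-03 12:00:00")),
    "observed_weather": (ts("2017-01-30 16:00:00"), ts("2018-05-04 00:00:00")),
}
period = shared_period(coverage)
print(period)
```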

Missing Data

Analysis

Imputation

# impute all the air quality station dataframes
for key, value in airQ_data_dict.items():
    # change the 'time' data type from 'object' to 'datetime'
    value.time = value.time.apply(lambda x: pd.Timestamp(x))
    imputed_df = imputeWeatherData(value)
    imputed_df.to_csv('D:/Akhila/Air pollution data set/imputed_data/{}_imputed.csv'.format(key.replace('.csv', '')))
    #atzx_impute.to_csv('imputed_data/atzx_imputed.csv')

Training data & Testing data

Import cleaned data